library("dplyr")
library("Seurat")
library("knitr")
library("ggplot2")
library("BiocManager")
library("here")
#BiocManager::install("EnhancedVolcano")
library("EnhancedVolcano") #volcano plot
#install.packages('DESeq2') #for DEG
library("DESeq2")
library("tidyverse") #tidy up data
library("styler") #tidy up data
library("scCustomize") #for color scales)
library("qs") #for file compression
library("sf") #for spatial
library("rstatix") # For stats
library("FNN")


if (!require("kableExtra")) {install.packages("kableExtra"); require("kableExtra")} # for color brewer
if (!require("RColorBrewer")) {install.packages("RColorBrewer"); require("RColorBrewer")} # for color brewer
if (!require("sctransform")) {install.packages("sctransform"); require("sctransform")} # for data normalization
if (!require("glmGamPoi")) {BiocManager::install('glmGamPoi'); require("glmGamPoi")} # for data normalization, sctransform
if (!require("cowplot")) {install.packages("cowplot"); require("cowplot")} # for figure layout
if (!require("patchwork")) {install.packages("patchwork"); require("patchwork")} # for figure patching
if (!require("openxlsx")) {install.packages("openxlsx"); require("openxlsx")} # to save .xlsx files

# install.packages("styler")

set.seed(12345)
# here()

1 Welcome

Welcome to the Single-Cell Omics Research and Education Club!

If this is your time to the club, I want to extend and extra-special welcome to you!

I’m Jonathan Nelson, an Assistant Professor at the University of Southern California. I’m a wet scientist turned wet+dry scientist. I’ve been working with single-cell RNAseq data for the past 5 years and I’m excited to share what I’ve learned with you.

1.1 SCORE Values

1.1.1 Learning

We believe that bioinformatics is a constantly evolving field, and that ongoing learning and professional development is essential to staying up-to-date. We encourage members to share their knowledge and experiences with each other, and to seek out opportunities for continued learning.

1.1.2 Accessibility

We believe that access to bioinformatics support should be available to everyone. We strive to create a welcoming and inclusive environment where all members can feel comfortable asking for help and contributing to the group.

1.1.3 Collaboration

We believe that working together is key to achieving success in bioinformatics. We value the diversity of perspectives and backgrounds that each member brings, and we encourage open communication and the sharing of ideas.

1.1.4 Integrity

We believe in conducting ourselves with honesty and professionalism in all our interactions. We hold ourselves to high ethical standards and respect the privacy and confidentiality of all members.

1.1.5 Empathy

We believe in approaching each other with empathy and kindness. We understand that bioinformatics can be a challenging and sometimes frustrating field, and we strive to support each other through these difficulties.

1.2 Context and Expectations

I know a lot of this has been going on in the background for everyone and I wanted to bring it to the forefront. My expectation is that we have 4-6 more meetings about spatial transcriptomics…and then we’ll re-evaluate.

Email me you would like me to add anyone:

Today’s code (this html file) will be posted to the SCORE website (https://usckrc.github.io/website/score.html)

2 The Agenda!

2.1 Music and Memes

2.2 New Package: sf

2.3 Main Theme: Going Spatial



3 Music and Memes

3.1 This Months Coding Music

3.1.1 Odesza: My Friends Never Die

Link to Album

3.1.2 “Without You” Hits (Love the Sampling of Gotye)

3.1.2.1 Bonus for watching the 1988 Remix of Somebody That I Used To Know

3.1.2.1.1 I really thought the song was from 1988 after watching this video

3.2 The Memes

3.2.1 Meme 1: The PhD Journey

3.2.2 Meme 2: Why We Do It!

3.2.3 Meme 3: An Inverse Relationship For Dude Scientists

3.2.4 Meme 4: What We Call Each Other!



4 New Package: sf

4.1 Thank you Thomas Kowal!

4.2 Simple Features Package

4.4 Was able to Vibe-code with it pretty well!

4.5 We’ll use this package for the Main Theme!

5 Main Theme: Going Spatial

5.1 Series on Spatial Transcriptomics


5.1.0.1 See SCORE #6 for Background on Spatial Transcriptomics

https://usckrc.github.io/website/SCORE_6__LZ_ST_ppt.pdf

5.1.0.2 See SCORE #7 for Starting with a Xenium Dataset

https://usckrc.github.io/website/SCORE_7.html

5.1.0.3 See SCORE #8 for Annotating a Xenium Dataset

https://usckrc.github.io/website/SCORE_8.html

5.2 This is a new way of thinking for me

5.2.1 Geography

5.2.2 Coordinates

5.3 There are coordinates associated witih your xenium dataset

5.4 Every cell has a centroid coordinate associated with it.

5.4.1 outs>cells.csv.gz

Contains: cell_id // x_centroid // y_centroid // transcript_counts // control_probe_counts // genomic_control_counts // control_codeword_counts // unassigned_codeword_counts // deprecated_codeword_counts // total_counts // cell_area // nucleus_area // nucleus_count // segmentation_method

5.4.1.1 I add centroids into the meta.data of my xenium object

df <- read.csv(here("Outs_DAPI expansion", "cells.csv.gz"))

df <- df %>%
  column_to_rownames(var = colnames(df)[1])

xenium.obj <- AddMetaData(xenium.obj, df)

head(xenium.obj@meta.data)

5.4.2 What can you do with centroids?

5.4.3 Questions that I’ve tried to answer recently:

5.4.3.1 Q1: How far away are cells from a point?

5.4.3.2 Q2: Can I plot cells along the cortical -> medullary gradient?

5.4.3.3 Q3: Can I identify genes whose expression tracks with the cortical -> medullary gradient?

5.4.3.4 Q4: Is the geographic distribution of a population different from another?

5.4.3.5 Q5: What cells are most likely to be next to another?

5.5 Q1: How far away are cells from a point?

5.5.1 Load Xenium Object

5.5.1.1 Check that cell coordinates are in meta.data.

xenium.obj <- qread(here("Week 9 Going Spatial", "region1_xenium.qs"))

xenium.obj
## An object of class Seurat 
## 18529 features across 155070 samples within 6 assays 
## Active assay: SCT (5106 features, 3000 variable features)
##  3 layers present: counts, data, scale.data
##  5 other assays present: Xenium, BlankCodeword, ControlCodeword, ControlProbe, GenomicControl
##  2 dimensional reductions calculated: pca, umap
##  1 spatial field of view present: fov
head(xenium.obj@meta.data)

5.5.2 Identify the Center Point

5.5.2.1 I did this manually in Xenium Explorer

5.5.2.2 Make sure that you have the “Cells” toggled on.

5.5.3 Set Center Corrdinates

x_center <- 3320.59
y_center <-  2731.66

5.5.4 Create an sf object from meta.data

md <- xenium.obj@meta.data

xenium_sf <- st_as_sf(
  md,
  coords = c("x_centroid", "y_centroid"),
  crs = NA   # Xenium coordinates are typically in microns
)

head(xenium_sf) 

5.5.5 Set the center in your sf object

crs = coordinate reference system: assigns the same coordinate reference system as your existing xenium_sf object.

center_sf <- st_sfc(
  st_point(c(x_center, y_center)),
  crs = st_crs(xenium_sf)
)

5.5.6 Measure distance from center for every cell

xenium_sf$dist_from_center <- as.numeric(
  st_distance(xenium_sf, center_sf)
)

5.5.7 Add distance from center to xenium object meta.data

xenium.obj$dist_from_center <- xenium_sf$dist_from_center

head(xenium.obj@meta.data)

5.5.8 Plot Distance From Center

f1 <- DimPlot(xenium.obj, label = TRUE) +
  theme_void() +
  NoLegend()

f2 <- FeaturePlot(xenium.obj, features = "dist_from_center") +
  theme_void() +
  ggtitle("Distance From Center in Microns") +
  theme(
    plot.title = element_text(hjust = 0.5)  # center the title
  )

f1 + f2

5.5.9 Let’s Explore This By Cell Type

5.5.10 Pull meta.data and set bin width (100um)

md <- xenium.obj@meta.data

bin_width <- 100
md <- md %>%
  mutate(
    dist_bin = floor(dist_from_center / bin_width) + 1
  )
freq_df <- md %>%
  dplyr::count(subclass, dist_bin, seurat_clusters) %>%
  group_by(subclass) %>%
  mutate(
    freq_norm = n / sum(n)
  ) %>%
  ungroup()

ggplot(freq_df,
       aes(x = dist_bin, y = freq_norm)) +
  geom_col(fill = "grey40") +
  facet_wrap(~ subclass, scales = "free_y") +
  theme_classic()

5.5.11 What about by Seurat Clusters (subgroup analysis)

5.5.11.1 Identify sub-populations that might be geographically distinct

ggplot(freq_df,
       aes(x = dist_bin, y = freq_norm, fill = seurat_clusters)) +
  geom_col() +
  facet_wrap(~ subclass, scales = "free_y") +
  theme_classic()

5.5.12 But there is a BIG Problem!

5.5.13 Kidney anatomy isn’t a circle!

5.5.14 There is a gradient

5.6 Q2: Can I plot cells on a the cortical -> medullary gradient?

5.6.1 So I went to Xenium Explorer and added custom regional annotations

5.6.1.1 Probably spent more time being precise than I needed to.

5.6.1.2 Save annotations as GeoJASN files (very helpful)

5.6.1.2.1 annotation_all.geojson

5.6.2 Then save list of cells within each annotations

5.6.2.1 Cortex_cells_stats.csv

5.6.2.2 Inner_Medulla_cells_stats.csv

5.6.2.3 Papilla_cells_stats.csv

5.6.2.4 Fat_cells_stats.csv

5.6.3 Import Xenium Explorer Annotations into Seurat Object

pap <- read_csv("Papilla_cells_stats.csv", skip = 2) %>%
  as.data.frame() %>%
  column_to_rownames(var = colnames(.)[1]) %>%
  mutate(region = "papilla") %>%
  select(region)

xenium.obj <- AddMetaData(xenium.obj, pap)

med  <- read_csv("Inner_Medulla_cells_stats.csv", skip = 2) %>%
  as.data.frame() %>%
  column_to_rownames(var = colnames(.)[1]) %>%
  mutate(region = "inner_medulla") %>%
  select(region)

xenium.obj <- AddMetaData(xenium.obj, med)

cor  <- read_csv("Cortex_cells_stats.csv", skip = 2) %>%
  as.data.frame() %>%
  column_to_rownames(var = colnames(.)[1]) %>%
  mutate(region = "cortex") %>%
  select(region)

xenium.obj <- AddMetaData(xenium.obj, cor)

fat  <- read_csv("Fat_cells_stats.csv", skip = 2) %>%
  as.data.frame() %>%
  column_to_rownames(var = colnames(.)[1]) %>%
  mutate(region = "fat") %>%
  select(region)

xenium.obj <- AddMetaData(xenium.obj, fat)

xenium.obj@meta.data <- xenium.obj@meta.data %>%
  mutate(region = ifelse(is.na(region), "outer_medulla", region))

xenium.obj@meta.data$region <- factor(
  xenium.obj@meta.data$region,
  levels = c("cortex", "outer_medulla", "inner_medulla", "papilla", "fat")
)

5.6.4 Visualize Cells by Region

f1 <- DimPlot(xenium.obj, label = TRUE) +
  theme_void() +
  NoLegend()

f2 <- DimPlot(xenium.obj, group.by = "region") +
  theme_void() +
  ggtitle("Cells Colored by Region") +
  theme(
    plot.title = element_text(hjust = 0.5)  # center the title
  )

f1 + f2

head(xenium.obj@meta.data)

5.6.5 Proportion of Cells by Region

df <- xenium.obj@meta.data %>%
  select(region, subclass) %>%
  filter(!is.na(region), !is.na(subclass))

prop_df <- df %>%
  dplyr::count(subclass, region) %>%
  group_by(subclass) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()


ggplot(prop_df, aes(x = subclass, y = prop, fill = region)) +
  geom_col(position = "stack") +
  labs(y = "Proportion of cells", x = "Subclass") +
  theme_classic() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(face = "bold")
  )

5.6.6 But I still wanted to measure a distance!

5.6.6.1 After A LOT of thinking…this is what I ended up with

5.6.7 Two Axis System

Negative = Cortex -> 0 = Medulla -> Positive = Papilla

5.6.7.1 Key Line of Code

`xenium_sf <- xenium_sf %>%
  mutate(
    dist_from_center2 = ifelse(region == "papilla",
                                -dist_from_center,
                                dist_from_center)
  )`
xenium.obj@meta.data
x_center <- 3320.59
y_center <-  2731.66

md <- xenium.obj@meta.data

xenium_sf <- st_as_sf(
  md,
  coords = c("x_centroid", "y_centroid"),
  crs = NA   # Xenium coordinates are typically in microns
)

center_sf <- st_sfc(
  st_point(c(x_center, y_center)),
  crs = st_crs(xenium_sf)
)

xenium_sf$dist_from_center <- as.numeric(
  st_distance(xenium_sf, center_sf)
)

xenium_sf <- xenium_sf %>%
  mutate(
    dist_from_center2 = ifelse(region == "papilla",
                                dist_from_center,
                                -dist_from_center)
  )

xenium.obj$dist_from_center2 <- xenium_sf$dist_from_center2
f1 <- DimPlot(xenium.obj, label = TRUE) +
  theme_void() +
  NoLegend()

f2 <- FeaturePlot(xenium.obj, features = "dist_from_center2") +
  theme_void() +
  labs(
    title = "Distance From Center in Microns",
    subtitle = "Cortex (neg) to Medulla (0) to Papilla (pos)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)  # center subtitle too
  )

f1 + f2

md <- xenium.obj@meta.data

bin_width <- 100

md <- md %>%
  mutate(
    dist_bin = floor(dist_from_center2 / bin_width) + 1
  )
freq_df <- md %>%
  dplyr::count(subclass, dist_bin, seurat_clusters, region) %>%
  group_by(subclass) %>%
  mutate(
    freq_norm = n / sum(n)
  ) %>%
  ungroup()

ggplot(freq_df,
       aes(x = dist_bin, y = freq_norm)) +
  geom_col(fill = "grey40") +
  facet_wrap(~ subclass, scales = "free_y") +
  theme_classic()

5.6.7.2 Identify sub-populations that might be geographically distinct

ggplot(freq_df,
       aes(x = dist_bin, y = freq_norm, fill = seurat_clusters)) +
  geom_col() +
  facet_wrap(~ subclass, scales = "free_y") +
  theme_classic()

5.6.8 Two populations where this change made a big difference

5.7 Q3: Can I indentify genes whose expression tracks with the cortical -> medullary gradient?

5.7.1 The Principal Cells Are Located from Cortex to Papilla!

5.7.2 Principal Cells in Xenium Explorer

xenium_pc <- subset(xenium.obj, subset = subclass == "PC")

freq_df %>%
  filter(subclass == "PC") %>%
  ggplot(aes(x = dist_bin, y = freq_norm, fill = region)) +
  geom_col() +
  theme_classic() +
  scale_x_continuous(
    breaks = c(-30, 0, 20),                 # positions for the labels
    labels = c("Cortex", "Medulla", "Papilla")  # labels
  ) +
  xlab(NULL)  # optional: remove default x-axis title

5.7.3 Pull Assay Data and Distances from meta.data

DefaultAssay(xenium_pc) <- "SCT"

# Expression matrix
expr_mat <- GetAssayData(xenium_pc, slot = "data")

# Distance vector
dist_vec <- xenium_pc@meta.data$dist_from_center2

# Remove NA cells
keep_cells <- !is.na(dist_vec)
expr_mat <- expr_mat[, keep_cells]
dist_vec <- dist_vec[keep_cells]

5.7.4 Correlate Gene Expression with Distance

Using cor.test function

cor_results <- apply(expr_mat, 1, function(gene_expr) {
  test <- cor.test(gene_expr, dist_vec, method = "spearman")
  c(
    rho = test$estimate,
    pval = test$p.value
  )
})

cor_results <- as.data.frame(t(cor_results))
cor_results$gene <- rownames(cor_results)

# FDR correction
cor_results$padj <- p.adjust(cor_results$pval, method = "BH")

cor_results <- cor_results %>%
  arrange(padj)
# 4 genes higher in cortex (negative rho)
cortex_high <- cor_results %>%
  arrange(rho.rho) %>%
  dplyr::slice(1:4) %>%
  pull(gene)

# 4 genes lower in cortex (positive rho)
cortex_low <- cor_results %>%
  arrange(desc(rho.rho)) %>%
  dplyr::slice(1:4) %>%
  pull(gene)

print("Higher in Cortex")
## [1] "Higher in Cortex"
cortex_high
## [1] "Ldhb"   "Scnn1a" "Scnn1g" "Txnip"
print("Lower in Cortex")
## [1] "Lower in Cortex"
cortex_low
## [1] "Akr1b3"  "Hsph1"   "Aldh1a3" "Cryab"

5.7.5 Plot Top 4 Cortical and Medullary Genes

rho +1 = expression increases with distance rho -1 = expression decreases with distance

genes_to_plot <- c(cortex_low, cortex_high)

genes_to_plot <- intersect(genes_to_plot, rownames(expr_mat))

DefaultAssay(xenium_pc) <- "SCT"

dist_vec   <- as.numeric(as.character(xenium_pc$dist_from_center2))
region_vec <- xenium_pc$region

expr_subset <- expr_mat[genes_to_plot, , drop = FALSE]

plot_df <- as.data.frame(t(as.matrix(expr_subset)))
plot_df$distance <- dist_vec
plot_df$region   <- region_vec

plot_df_long <- plot_df %>%
  pivot_longer(
    cols = all_of(genes_to_plot),
    names_to = "gene",
    values_to = "expression"
  )

plot_df_long$gene <- factor(
  plot_df_long$gene,
  levels = genes_to_plot
)

rho_df <- cor_results %>%
  dplyr::filter(gene %in% genes_to_plot) %>%
  dplyr::select(gene, rho = rho.rho) %>%  # rename for convenience
  dplyr::mutate(
    label = paste0("ρ = ", round(rho, 2))  # format nicely
  )

ggplot(plot_df_long, aes(x = distance, y = expression, color = region)) +
  geom_point(alpha = 0.4, size = 0.6) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  facet_wrap(~ gene, scales = "free_y", ncol = 4) +
  scale_x_continuous(
    breaks = c(-2500,  1500),
    labels = c("Cortex", "Papilla")
  ) +
  theme_classic() +
  labs(
    x = "",
    y = "SCT Expression"
  ) +
  theme(
    strip.text = element_text(face = "bold"),
    legend.position = "right"
  ) +
  guides(color = guide_legend(override.aes = list(size = 4))) +
 geom_text(
    data = rho_df,
    aes(x = 20, y = Inf, label = label),  # top-right of each facet
    inherit.aes = FALSE,
    hjust = .5,
    vjust = 1.5
  )

5.7.6 Ldhb (Cortex)

5.7.7 Akr1b3 (Medulla)

5.7.8 Could you have found these genes by FindMarkers between Cortex and Papilla?

5.7.8.1 Nope!

Idents(xenium_pc) <- xenium_pc@meta.data$region

df2 <- FindMarkers(xenium_pc, ident.1 = "cortex")

df2 <- df2 %>%
  arrange(desc(avg_log2FC))

head(df2,10)
tail(df2,10)

5.8 Q4: Is the geographic distribution of a population different from another?

ggplot(freq_df,
       aes(x = dist_bin, y = freq_norm, fill = region)) +
  geom_col() +
  facet_wrap(~ subclass, scales = "free_y") +
  theme_classic()

5.8.1 Using Kruskal–Wallis followed by Dunn’s post hoc pairwise comparison

5.8.1.1 non-parametric alternative to one-way ANOVA

5.8.1.2 Compares each pair of groups

5.8.1.3 Uses rank-based comparisons

5.8.1.4 Corrects for multiple comparisons (Holm)

subclass_order <- c(
  "PTS1", "PTS2", "PTS3", "DTL", "TALA", "TALB", "MD", "DCT1", "DCT2",
  "CNT", "PC", "ICA", "ICB", "PODO", "PEC", "URO", "EC", "FIB", "MES",
  "VSMC", "LYMPHO", "MACRO", "FAT"
)

xenium_df_clean <- xenium_sf %>%
  st_drop_geometry() %>%                  # if starting from sf object
  select(dist_from_center2, subclass) %>%
  filter(!is.na(dist_from_center2), !is.na(subclass)) %>%
  group_by(subclass) %>%
  filter(n() >= 2) %>%                    # keep only groups with >=2 cells
  ungroup() %>%
  mutate(subclass = factor(subclass, levels = subclass_order))

xenium_df_clean2 <- xenium_df_clean %>%
  filter(subclass != "PEC") %>%
  droplevels()

# Dunn with Holm correction
dunn_res <- xenium_df_clean2 %>%
  dunn_test(dist_from_center2 ~ subclass,
            p.adjust.method = "holm") 

groups <- levels(xenium_df_clean2$subclass)

full_grid <- expand.grid(
  group1 = groups,
  group2 = groups
)

dunn_complete <- full_grid %>%
  left_join(dunn_res, by = c("group1", "group2"))

dunn_complete <- dunn_complete %>%
  mutate(
    p.adj = ifelse(group1 == group2, 1, p.adj)
  )

dunn_mirror <- dunn_complete %>%
  bind_rows(
    dunn_complete %>%
      transmute(group1 = group2,
                group2 = group1,
                p.adj)  # include any other needed columns
  ) %>%
  distinct(group1, group2, .keep_all = TRUE)

dunn_mirror <- dunn_mirror %>%
  mutate(
    group1 = factor(group1, levels = groups),
    group2 = factor(group2, levels = groups)
  )

dunn_mirror <- dunn_mirror %>%
  mutate(
    log_p = -log10(p.adj),
    log_p = ifelse(
      is.infinite(log_p),
      max(log_p[is.finite(log_p)]),
      log_p
    )
  )

ggplot(dunn_mirror, aes(group1, group2, fill = log_p)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal() +
  labs(x = NULL, y = NULL) +   # remove axis titles
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
    axis.text.y = element_text(face = "bold")
  )

5.9 Q5: What cells are most likely to be next to another?

5.9.1 Positive Control: PODO

df <- xenium.obj@meta.data

# ------------------------------
# 0. Define query cell type
# ------------------------------
cell_type <- "PODO"  # Change as needed

xenium_pc <- subset(xenium.obj, subset = subclass == cell_type)

# ------------------------------
# 1. Prepare SF coordinates
# ------------------------------
df$x_centroid <- as.numeric(as.character(df$x_centroid))
df$y_centroid <- as.numeric(as.character(df$y_centroid))
df$id <- seq_len(nrow(df))
cells_sf <- st_as_sf(df, coords = c("x_centroid", "y_centroid"), crs = NA)

# ------------------------------
# 2. Query and reference cells
# ------------------------------
query_cells <- cells_sf %>% filter(subclass == cell_type)
reference_cells <- cells_sf %>% filter(subclass != cell_type)

# Extract coordinates
query_coords <- do.call(rbind, st_geometry(query_cells)) %>% as.matrix()
ref_coords <- do.call(rbind, st_geometry(reference_cells)) %>% as.matrix()

# ------------------------------
# 3. Nearest neighbor search (k = 1)
# ------------------------------
nn <- get.knnx(ref_coords, query_coords, k = 1)
nearest_cells <- reference_cells %>% dplyr::slice(nn$nn.index[,1])

# Attach nearest neighbor info to query_cells
query_cells$nn_distance <- nn$nn.dist[,1]
query_cells$nn_type <- nearest_cells$subclass

# ------------------------------
# 4. Build nn_by_subtype for plotting/stats
# ------------------------------
nn_by_subtype <- query_cells %>%
  st_drop_geometry() %>%
  dplyr::select(id, nn_type, nn_distance) %>%
  dplyr::rename(ref_subtype = nn_type)

# Summarize distances
summary_by_subtype <- nn_by_subtype %>%
  group_by(ref_subtype) %>%
  summarise(
    mean_distance = mean(nn_distance),
    median_distance = median(nn_distance),
    n = n(),
    .groups = "drop"
  )

# ------------------------------
# 5. Plots and statistical tests
# ------------------------------

ggplot(summary_by_subtype, aes(x = reorder(ref_subtype, n), 
                               y = median_distance, 
                               size = n)) +
  geom_point(color = "#E69F00", alpha = 1) +
  coord_flip() +
  labs(
    x = "Reference Subtype",
    y = paste("Median Distance to", cell_type, "(um)"),
    size = paste("Number of", cell_type, "cells"),
    title = paste("Median Distance from", cell_type, "to Each Subtype")
  ) +
  theme_minimal(base_size = 14) +
  theme(axis.text.y = element_text(face = "bold"),
        axis.title = element_text(face = "bold"))

5.9.2 Let’s explore a new Population!: EC’s

df <- xenium.obj@meta.data

# ------------------------------
# 0. Define query cell type
# ------------------------------
cell_type <- "EC"  # Change as needed

xenium_pc <- subset(xenium.obj, subset = subclass == cell_type)

# ------------------------------
# 1. Prepare SF coordinates
# ------------------------------
df$x_centroid <- as.numeric(as.character(df$x_centroid))
df$y_centroid <- as.numeric(as.character(df$y_centroid))
df$id <- seq_len(nrow(df))
cells_sf <- st_as_sf(df, coords = c("x_centroid", "y_centroid"), crs = NA)

# ------------------------------
# 2. Query and reference cells
# ------------------------------
query_cells <- cells_sf %>% filter(subclass == cell_type)
reference_cells <- cells_sf %>% filter(subclass != cell_type)

# Extract coordinates
query_coords <- do.call(rbind, st_geometry(query_cells)) %>% as.matrix()
ref_coords <- do.call(rbind, st_geometry(reference_cells)) %>% as.matrix()

# ------------------------------
# 3. Nearest neighbor search (k = 1)
# ------------------------------
nn <- get.knnx(ref_coords, query_coords, k = 1)
nearest_cells <- reference_cells %>% dplyr::slice(nn$nn.index[,1])

# Attach nearest neighbor info to query_cells
query_cells$nn_distance <- nn$nn.dist[,1]
query_cells$nn_type <- nearest_cells$subclass

# ------------------------------
# 4. Build nn_by_subtype for plotting/stats
# ------------------------------
nn_by_subtype <- query_cells %>%
  st_drop_geometry() %>%
  dplyr::select(id, nn_type, nn_distance) %>%
  dplyr::rename(ref_subtype = nn_type)

# Summarize distances
summary_by_subtype <- nn_by_subtype %>%
  group_by(ref_subtype) %>%
  summarise(
    mean_distance = mean(nn_distance),
    median_distance = median(nn_distance),
    n = n(),
    .groups = "drop"
  )

# ------------------------------
# 5. Plots and statistical tests
# ------------------------------

ggplot(summary_by_subtype, aes(x = reorder(ref_subtype, n), 
                               y = median_distance, 
                               size = n)) +
  geom_point(color = "#E69F00", alpha = 1) +
  coord_flip() +
  labs(
    x = "Reference Subtype",
    y = paste("Median Distance to", cell_type, "(um)"),
    size = paste("Number of", cell_type, "cells"),
    title = paste("Median Distance from", cell_type, "to Each Subtype")
  ) +
  theme_minimal(base_size = 14) +
  theme(axis.text.y = element_text(face = "bold"),
        axis.title = element_text(face = "bold"))

5.9.3 DEGs by Neighbor

# Step 1: Extract the nearest neighbor info for query cells
nn_info <- query_cells %>%
  st_drop_geometry() %>%
  dplyr::select(id, nn_type, nn_distance)

# Step 2: Initialize new columns with correct types
xenium.obj@meta.data$nn_type <- as.character(NA)   # character column
xenium.obj@meta.data$nn_distance <- NA_real_       # numeric column

# Step 3: Assign nearest neighbor info
xenium.obj@meta.data$nn_type[query_cells$id] <- as.character(nn_info$nn_type)
xenium.obj@meta.data$nn_distance[query_cells$id] <- nn_info$nn_distance

# Subset as usual
xenium_sub <- subset(xenium.obj, subset = subclass == cell_type)

head(xenium_sub@meta.data)
Idents(xenium_sub) <- xenium_sub@meta.data$nn_type

df3 <- FindMarkers(xenium_sub, ident.1 = c("PC"), min.pct = .25)

VlnPlot(xenium_sub, "Aqp2", group.by = "nn_type")

VlnPlot(xenium_sub, "Cryab", group.by = "nn_type")

5.10 Questions?

5.11 Community Questions

Do you have a coding problem that you’d like some support on?
Do you have a topic you’d like covered at a future meeting?

Email me:

5.12 Upcoming Schedule

5.13 Brainstorming Upcoming Topics

5.13.1 Potential Future Topics on Xenium Analysis

5.13.1.1 Spatial Cell Segmentation Algorithms

5.13.1.2 Label Transfer between snRNAseq and Xenium

5.13.1.3 Integration of Xenium and snRNAseq Datasets

5.13.1.4 Comparing Xenium and snRNAseq DEG Analysis

6 Session Info

sessionInfo()
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] openxlsx_4.2.8              patchwork_1.3.2            
##  [3] cowplot_1.2.0               glmGamPoi_1.20.0           
##  [5] sctransform_0.4.2           RColorBrewer_1.1-3         
##  [7] kableExtra_1.4.0            FNN_1.1.4.1                
##  [9] rstatix_0.7.2               sf_1.0-24                  
## [11] qs_0.27.3                   scCustomize_3.2.0          
## [13] styler_1.10.3               lubridate_1.9.4            
## [15] forcats_1.0.0               stringr_1.5.1              
## [17] purrr_1.1.0                 readr_2.1.5                
## [19] tidyr_1.3.1                 tibble_3.3.0               
## [21] tidyverse_2.0.0             DESeq2_1.48.2              
## [23] SummarizedExperiment_1.38.1 Biobase_2.68.0             
## [25] MatrixGenerics_1.20.0       matrixStats_1.5.0          
## [27] GenomicRanges_1.60.0        GenomeInfoDb_1.44.2        
## [29] IRanges_2.42.0              S4Vectors_0.46.0           
## [31] BiocGenerics_0.54.0         generics_0.1.4             
## [33] EnhancedVolcano_1.26.0      ggrepel_0.9.6              
## [35] here_1.0.1                  BiocManager_1.30.26        
## [37] ggplot2_4.0.1               knitr_1.50                 
## [39] Seurat_5.3.0                SeuratObject_5.2.0         
## [41] sp_2.2-0                    dplyr_1.1.4                
## 
## loaded via a namespace (and not attached):
##   [1] spatstat.sparse_3.1-0   httr_1.4.7              tools_4.5.1            
##   [4] backports_1.5.0         R6_2.6.1                mgcv_1.9-3             
##   [7] lazyeval_0.2.2          uwot_0.2.3              withr_3.0.2            
##  [10] gridExtra_2.3           progressr_0.15.1        cli_3.6.5              
##  [13] textshaping_1.0.3       spatstat.explore_3.5-2  fastDummies_1.7.5      
##  [16] labeling_0.4.3          sass_0.4.10             S7_0.2.0               
##  [19] spatstat.data_3.1-8     proxy_0.4-29            ggridges_0.5.7         
##  [22] pbapply_1.7-4           systemfonts_1.2.3       svglite_2.2.1          
##  [25] R.utils_2.13.0          dichromat_2.0-0.1       parallelly_1.45.1      
##  [28] rstudioapi_0.17.1       shape_1.4.6.1           RApiSerialize_0.1.4    
##  [31] vroom_1.6.5             ica_1.0-3               spatstat.random_3.4-1  
##  [34] car_3.1-3               zip_2.3.3               Matrix_1.7-3           
##  [37] ggbeeswarm_0.7.2        abind_1.4-8             R.methodsS3_1.8.2      
##  [40] lifecycle_1.0.4         yaml_2.3.10             snakecase_0.11.1       
##  [43] carData_3.0-5           SparseArray_1.8.1       Rtsne_0.17             
##  [46] paletteer_1.6.0         grid_4.5.1              promises_1.3.3         
##  [49] crayon_1.5.3            miniUI_0.1.2            lattice_0.22-7         
##  [52] beachmat_2.24.0         pillar_1.11.0           future.apply_1.20.0    
##  [55] codetools_0.2-20        glue_1.8.0              spatstat.univar_3.1-4  
##  [58] data.table_1.17.8       vctrs_0.6.5             png_0.1-8              
##  [61] spam_2.11-1             gtable_0.3.6            rematch2_2.1.2         
##  [64] cachem_1.1.0            xfun_0.53               S4Arrays_1.8.1         
##  [67] mime_0.13               survival_3.8-3          units_1.0-0            
##  [70] fitdistrplus_1.2-4      ROCR_1.0-11             nlme_3.1-168           
##  [73] bit64_4.6.0-1           RcppAnnoy_0.0.22        rprojroot_2.1.1        
##  [76] R.cache_0.17.0          bslib_0.9.0             irlba_2.3.5.1          
##  [79] vipor_0.4.7             KernSmooth_2.23-26      colorspace_2.1-1       
##  [82] DBI_1.2.3               ggrastr_1.0.2           tidyselect_1.2.1       
##  [85] bit_4.6.0               compiler_4.5.1          xml2_1.4.0             
##  [88] DelayedArray_0.34.1     plotly_4.11.0           stringfish_0.17.0      
##  [91] scales_1.4.0            classInt_0.4-11         lmtest_0.9-40          
##  [94] digest_0.6.37           goftest_1.2-3           presto_1.0.0           
##  [97] spatstat.utils_3.1-5    rmarkdown_2.29          XVector_0.48.0         
## [100] htmltools_0.5.8.1       pkgconfig_2.0.3         fastmap_1.2.0          
## [103] rlang_1.1.6             GlobalOptions_0.1.2     htmlwidgets_1.6.4      
## [106] UCSC.utils_1.4.0        shiny_1.11.1            farver_2.1.2           
## [109] jquerylib_0.1.4         zoo_1.8-14              jsonlite_2.0.0         
## [112] BiocParallel_1.42.2     R.oo_1.27.1             magrittr_2.0.3         
## [115] Formula_1.2-5           GenomeInfoDbData_1.2.14 dotCall64_1.2          
## [118] Rcpp_1.1.0              reticulate_1.43.0       stringi_1.8.7          
## [121] MASS_7.3-65             plyr_1.8.9              parallel_4.5.1         
## [124] listenv_0.9.1           deldir_2.0-4            splines_4.5.1          
## [127] tensor_1.5.1            hms_1.1.3               circlize_0.4.16        
## [130] locfit_1.5-9.12         igraph_2.1.4            spatstat.geom_3.5-0    
## [133] RcppHNSW_0.6.0          reshape2_1.4.4          evaluate_1.0.5         
## [136] RcppParallel_5.1.11-1   ggprism_1.0.7           tzdb_0.5.0             
## [139] httpuv_1.6.16           RANN_2.6.2              polyclip_1.10-7        
## [142] future_1.67.0           scattermore_1.2         janitor_2.2.1          
## [145] broom_1.0.9             xtable_1.8-4            e1071_1.7-17           
## [148] RSpectra_0.16-2         later_1.4.4             viridisLite_0.4.2      
## [151] class_7.3-23            beeswarm_0.4.0          cluster_2.1.8.1        
## [154] timechange_0.3.0        globals_0.18.0